-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Faster Expression parser / Tokenizer with StringScanner #1844
Conversation
7538bcb
to
e912bb0
Compare
10ef505
to
8759798
Compare
49e55a6
to
e0d54cc
Compare
7419a44
to
c0e45c1
Compare
1c8460c
to
e6786f6
Compare
lib/liquid/expression.rb
Outdated
|
||
def parse_number(markup) | ||
ss = string_scanner | ||
ss.string = markup |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like it might be possible to make this method faster by not using string scanner and manually iterating over the bytes in the string. It looks like markup
is a string and we're just setting it on the string scanner.
Just as a rough outline, something like this:
idx = 0
byte = markup.getbyte(idx)
return false if return false if byte != DASH && byte != DOT && (byte < ZERO || byte > NINE)
while (byte = markup.getbyte(idx) && idx += 1) # get the current byte, advance the cursor
return false if byte != DOT && (byte < ZERO || byte > NINE)
...
end
It's kind of a pain to manage the cursor position yourself, but we'll actually create more JIT friendly code this way. I'd recommend using StringScanner
only when we need to scan through a large buffer and use complex matches (IOW, regular expressions). We might be able to be even faster with StringScanner in the future, but for now I think managing a cursor yourself will be faster (in this case).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great catch!
I agree that StringScanner
doesn't fit for this use case 👍
Unfortunately, I didn't get to see any performance improvement as much as I hoped to see... (roughly 2% diff by keeping the cursor in Ruby
(10 seconds warmup / 20 seconds run)
Baseline
ruby 3.4.0dev (2024-04-23T16:59:11Z v3.4.0-pshopify-pr.. 70b931820e) +YJIT [arm64-darwin23]
Warming up --------------------------------------
Liquid::Expression2#parse: number
39.927k i/100ms
Calculating -------------------------------------
Liquid::Expression2#parse: number
405.480k (± 0.4%) i/s (2.47 μs/i) - 8.145M in 20.087950s
Without StringScanner
ruby 3.4.0dev (2024-04-23T16:59:11Z v3.4.0-pshopify-pr.. 70b931820e) +YJIT [arm64-darwin23]
Warming up --------------------------------------
Liquid::Expression2#parse: number
39.051k i/100ms
Calculating -------------------------------------
Liquid::Expression2#parse: number
395.787k (± 0.3%) i/s (2.53 μs/i) - 7.927M in 20.029537s
Source code
def parse_number(markup)
# the first byte must be a digit, a period, or a dash
byte = markup.getbyte(0)
return false if byte != DASH && byte != DOT && (byte < ZERO || byte > NINE)
is_integer = true
last_dot_pos = nil
num_end_pos = nil
idx = 1
while byte = markup.getbyte(idx)
return false if byte != DOT && (byte < ZERO || byte > NINE)
idx += 1
# we found our number and now we are just scanning the rest of the string
next if num_end_pos
if byte == DOT
if is_integer == false
num_end_pos = idx - 1
else
is_integer = false
last_dot_pos = idx
end
end
end
num_end_pos = markup.length if idx == markup.length
return markup.to_i if is_integer
if num_end_pos
# number ends with a number "123.123"
markup.byteslice(0, num_end_pos).to_f
elsif last_dot_pos
markup.byteslice(0, last_dot_pos).to_f
else
# we should never reach this point
false
end
end
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to profile to find what's slow, but I think avoiding the mutation of StringScanner
is a great improvement. Maybe try profiling with Vernier?
If I'm reading this correctly, it looks like this function requires that the whole string must contain 0-9
or a .
character. We seem to exit early if return false if byte != DOT && (byte < ZERO || byte > NINE)
happens.
If that's true, then we could do a fast path case that's like return markup.to_i if markup.match?(/\A[0-9]+\z/)
.
Something like this:
def parse_number(markup)
# the first byte must be a digit, a period, or a dash
byte = markup.getbyte(0)
return false if byte != DASH && byte != DOT && (byte < ZERO || byte > NINE)
# This handles all integer cases
return markup.to_i if markup.match?(/\A[0-9]+\z/)
# rest of method
end
It might be possible to do the float parsing without the loop, but I don't really understand what it's doing. It looks like it's trying to handle a number with multiple dots in it? I don't understand the case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like it's trying to handle a number with multiple dots in it? I don't understand the case.
Yup, since"123.456.789".to_f
returns123.456
in Ruby, we need to replicate that logic in Liquid.
The original Tokenizer
actually parses the number with the Regexs:
...
case markup
when INTEGERS_REGEX
Regexp.last_match(1).to_i
when RANGES_REGEX
RangeLookup.parse(Regexp.last_match(1), Regexp.last_match(2))
when FLOATS_REGEX
Regexp.last_match(1).to_f
else
...
with this approach of using Regex, we are scanning the String twice for parsing a number. (Once with INTEGERS_REGEX
and another one with FLOATS_REGEX
)
By scanning the String once, Tokenizer2
is 1.7x faster than Tokenizer1
for parsing numbers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, since
"123.456.789".to_f
returns123.456
in Ruby, we need to replicate that logic in Liquid.
Do we really need to replicate all logic from Ruby? For example "1e10".to_f
works in Ruby, but it's not supported here. Also, if it's following the same rules that the lexer uses, I don't think we would encounter a :number
token that contains multiple .
:
$ ruby -I lib -r liquid/lexer -e'p Liquid::Lexer.new("12.34.56").tokenize'
[[:number, "12.34"], [:dot, "."], [:number, "56"], [:end_of_string]]
Are these strings coming from the lexer or somewhere else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really need to replicate all logic from Ruby?
You are right, we don't always follow the Ruby's number parsing logic... 😅
So... Liquid has 2 parsing modes, lax
and strict
parsing mode. (lax
is our legacy parser)
In strict
mode, templates like {{ 123.123.123 }}
gets tokenized by Lexer#tokenize
first, and it will raise an error
Liquid syntax error: Expected end_of_string but found dot in "{{ 123.123.123 }}"
However, in lax
mode, {{ 123.123.123 }}
template gets parsed by Expression#parse
without using the Lexer
.
Since the Expression
class' FLOATS_REGEX
is /\A(-?\d[\d\.]+)\z/
, it allows multiple dots in the float :(
lib/liquid/tokenizer.rb
Outdated
|
||
class << self | ||
def string_scanner | ||
@string_scanner ||= StringScanner.new("") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a problem creating a new StringScanner object each time we need one? It seems like we're just caching the object and then mutating it later. That could lead to thread safety problems down the road. Usually I like to create one StringScanner
object per buffer I'm parsing, which should amortize the cost of the allocation as long as it's reused.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah... I am also worried about thread safety problem in the future :(
In local testing, we see around ~5.6% performance improvement, but in production, we were seeing performance regression due to increased GC calls.
Also, allocating a new StringScanner
object was more expensive than I have thought 🤔
For ParseContext#parse_expression`, allocating a new StringScanner took 6.7% of the total time:
For thread safety in the future, we could create a pool of StringScanner
later 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For those, who uses web servers like Puma, recycling the StringScanner
would become problematic... I just added StringScannerPool
to this PR 👍
Thank you for catching this!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For ParseContext#parse_expression`, allocating a new StringScanner took 6.7% of the total time
This is a recursive descent parser though right? Meaning parse_expression
will recursively call parse_expression
? In other words, does parse_expression
get called many times per document?
Ideally we would only create one StringScanner object per document we parse.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup - parse_expression
is a recursive function. The function always finish using its StringScanner
before it calls itself, so we are safe to reuse the StringScanner
.
The PR has been updated to create one StringScanner
object per document which is stored in the ParseContext
👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should use a pool like this. It feels like a memory leak waiting to happen. Race conditions could cause us to create many more StringScanner object than are necessary, and since we have no cap on the @ss_pool
we'll retain a bunch of string scanner objects that we don't need.
I really think we need to ensure that only one string scanner object is created per document we parse. I have a hard time believing that allocating one string scanner per document would be a significant source of overhead. Even if it is, it seems like we could refactor the tokenize
method to return fewer arrays and make up for any "lost time".
a37bdde
to
76c5f09
Compare
lib/liquid/expression.rb
Outdated
@@ -20,7 +22,7 @@ class Expression | |||
# malicious input as described in https://github.com/Shopify/liquid/issues/1357 | |||
RANGES_REGEX = /\A\(\s*(?>(\S+)\s*\.\.)\s*(\S+)\s*\)\z/ | |||
|
|||
def self.parse(markup) | |||
def self.parse(markup, _ss = nil) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[note] Providing default StringScanner
value makes the Expression
class compatible with Liquid-C
lib/liquid/expression.rb
Outdated
INTEGER_REGEX = /\A(-?\d+)\z/ | ||
FLOAT_REGEX = /\A(-?\d+)\.\d+\z/ | ||
|
||
CACHE = LruRedux::Cache.new(10_000) # most themes would have less than 2,000 unique expression |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we get 90% of the benefit by hardcoding a few common expressions? I'm not sure how I feel about a global cache like this given the objects aren't frozen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah... I could see how it could go wrong... 😓
I have updated the PR to create an Expression cache per document parsing.
The caches do fill up quickly, and the cache hit rate stays relatively high (~60% for Dawn)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note: We will have to update Liquid-C to actually provide Expression cache
Shopify/liquid-c#227
lib/liquid/context.rb
Outdated
@@ -176,7 +180,7 @@ def []=(key, value) | |||
# Example: | |||
# products == empty #=> products.empty? | |||
def [](expression) | |||
evaluate(Expression.parse(expression)) | |||
evaluate(Expression.parse(expression, @string_scanner)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Side note: I think Context#[]
should just be deprecated honestly, it's a very deceptive performance footgun.
Instead, we can add a new method: Context#parse_and_evaluate
0e4da50
to
aa7f926
Compare
lib/liquid/parse_context.rb
Outdated
# constructing new StringScanner in Lexer, Tokenizer, etc is expensive | ||
# This StringScanner will be shared by all of them | ||
@string_scanner = StringScanner.new("") | ||
@expression_cache = LruRedux::Cache.new(10_000) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's make this an option on the parse context
@expression_cache = LruRedux::Cache.new(10_000) | |
@expression_cache = options[:expression_cache] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's also have default expression caching option.
@expression_cache = if options[:expression_cache] != false
options[:expression_cache] || LruRedux::Cache.new(10_000)
end
With this way, people can re-use their expression caches between templates or disable the expression caching.
463231b
to
d48708a
Compare
7914596
to
97cada6
Compare
Faster Tokenizer & Expression Parser
Inspired by @tenderlove 's Rails World 2024 Closing Keynote and
@ianks 's #1832 PR ❤️
This PR updates the
Expression
Parser andTokenizer
to use the newStringScanner
to improve the performance.Faster Expression Parser!
Here are the performance improvements by using
Expression2
Benchmark Improvement
main
branch (baseline)fast-expression-parse
branchThis PR improves the parsing speed by ~24%, and parse & render speed by ~19%.
More Benchmark Results!
Here is another benchmarking with a larger and complex theme:
By using the LRU caching, Liquid is creating ~20% less objects, the parsing time is ~18% faster, and render time is ~9% faster.
With production traffic, we are seeing ~14% parsing time improvement: